Prepared by: Clayton Miller ETHZ
April 22, 2013
The goal of this notebook is to introduce a dataset of two years' worth of time series measurements from an international school campus in Singapore. This data is part of a research project focused on using data analytics to reduce the energy impact of buildings. The analysis is meant to distill the various raw data sources sometimes found in building controls and energy management systems.
First, a basic introduction to the domain of building cooling systems. Figure 1 below is a simplified diagram of a building cooling system common to large commercial buildings. The total system has many machines working together to deliver cold air to each of the zones in the building. There are two key subsystems:
Waterside system - the chilled water plant which produces cold water and distributes it throughout the campus; this water is used to cool air at the Air Handling Units (AHUs)
Airside system - the components after the chilled water coil in the air handling unit all the way to the zone variable air volume (VAV) box
#Figure 1
from IPython.display import Image
Image(filename='/Users/millerc/Dropbox/03-ETH/91-UWC/KaggleConsulting/Simplified Cooling System Diagram.jpg')
The building that will be focused on in this study is the United World College Campus in Singapore. It is a 70,000 square meter international school; construction was completed in June 2011 and students first used the facility in September of that year. It is a certified Green Mark Platinum building, which means it is at the top of building ratings in terms of energy efficiency and sustainability.
The facility is extraordinary from a data analysis standpoint: it has a few extra sensors measuring key energy information, and it trends and stores much of the data generated by the building controls system. This is quite different from a conventional commercial building, which hardly ever stores any of the data from its controls systems. The main reason is that operations staff find too much raw data unhelpful and confusing.
UWC is different in that they would like to store and manage as much data as possible and look to researchers to help them get value from this repository. We will focus on UWC's cooling system, as it accounts for approximately 65% of all the energy consumed on campus; year-round cooling and dehumidification are required in Singapore's climate. UWC spends approximately SGD 75,000 (40,000 euros) per month on cooling energy consumption!
Figure 2 is a diagram of the UWC campus that points out the main 'Blocks' or buildings on campus:
Please note that Education Block 1 has been renamed to Blocks B, C, and D, and Education Block 2 has been renamed to Block A.
#Figure 2 - Campus Map
Image(filename='/Users/millerc/Dropbox/03-ETH/91-UWC/KaggleConsulting/campusmap.jpeg')
In terms of the cooling system components, Figure 3 below shows a diagram of the Chilled Water Plant, Weather Stations, AHUs, and VAVs as they exist in the system. The diagram also includes the types of granular data available from each piece of equipment.
#Figure 3 - Cooling System Diagram for UWC
Image(filename='/Users/millerc/Dropbox/03-ETH/91-UWC/KaggleConsulting/Cooling System Equipment Heirarchy.jpg')
In this section, I will give an idea of what I'd like to do with the data. Figure 4 below gives an overview of the data sources and the importance of each in terms of the goals. We can discuss these goals in more detail on our introduction call.
#Figure 4 - Goals Overview
Image(filename='/Users/millerc/Dropbox/03-ETH/91-UWC/KaggleConsulting/KaggleGoals.png')
Finally, I will go through and describe the datasets and show a quick visualization of each.
Data can be found at: https://www.dropbox.com/sh/kap9dr21idkk51x/wQYgnMie6E
cd ~/Dropbox/03-ETH/91-UWC/KaggleConsulting/Data/
ls
This is the highest-level chilled water plant performance data: two years of 1-minute interval data with almost no gaps.
import pandas as pd
PrimarySystemMetrics = pd.read_csv('PrimarySystemMetrics.csv', parse_dates=True, index_col='0_1')
PrimarySystemMetrics
Resample to hourly and plot only 2013 so far:
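# Aggregate to hourly means (recent pandas needs the explicit .mean()), then plot one panel per column
PrimarySystemMetrics.resample('H').mean().truncate(before='2013-02-01').plot(subplots=True, figsize=(25,30))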
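To put a number on "almost no gaps", one option is to compare the timestamps that are present against a complete 1-minute grid. The following is a minimal sketch on synthetic data, not the author's method; the frame `demo` and the column name `kW` are made up for illustration, and the same comparison could be pointed at `PrimarySystemMetrics.index` if its timestamps fall on whole minutes.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one day of a 1-minute metric (hypothetical column "kW").
full_index = pd.date_range("2013-02-01", periods=24 * 60, freq="min")
demo = pd.DataFrame({"kW": np.arange(len(full_index), dtype=float)}, index=full_index)

# Drop 30 consecutive minutes to simulate a logging outage.
demo = demo.drop(full_index[100:130])

# Rebuild the complete 1-minute grid between the first and last record, then compare.
expected = pd.date_range(demo.index.min(), demo.index.max(), freq="min")
missing = expected.difference(demo.index)
coverage = 1 - len(missing) / len(expected)

print(f"{len(missing)} of {len(expected)} minutes missing, coverage = {coverage:.2%}")
```

This only counts absent rows. If an outage is instead recorded as rows filled with NaN, `demo.isna().mean()` would give the missing share per column.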
This is supplemental data for the chilled water system as a whole. This data (and the rest of the granular data) is collected by a different system that is much less reliable, so there are many more gaps in these datasets.
ChillerPlantSystemData = pd.read_csv('ChillerPlantSystem.csv', parse_dates=True, index_col='timestamp')
ChillerPlantSystemData
ChillerPlantSystemData.resample('H').mean().truncate(before='2013-01-01').plot(subplots=True, figsize=(25,30))
The 2013 data is good, but 2011-2012 has a few gaps:
ChillerPlantSystemData.resample('H').mean().truncate(after='2013-01-01').plot(subplots=True, figsize=(25,30))
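Beyond spotting gaps by eye in the plot, the start, end, and length of each gap can be listed from the spacing between consecutive timestamps. This is a sketch on synthetic timestamps under an assumed 5-minute logging interval; the real interval of `ChillerPlantSystem.csv` is not stated here, so the threshold would need to be adjusted.

```python
import numpy as np
import pandas as pd

# Synthetic timestamps at an assumed 5-minute logging interval.
idx = pd.date_range("2012-01-01", periods=288, freq="5min")
keep = np.ones(len(idx), dtype=bool)
keep[50:62] = False    # simulate 12 lost records
keep[200:203] = False  # simulate 3 lost records
idx = idx[keep]

stamps = pd.Series(idx, index=idx)
delta = stamps.diff()  # time elapsed since the previous record

# A gap is any step longer than the nominal interval.
is_gap = delta > pd.Timedelta("5min")
gaps = pd.DataFrame({
    "gap_start": stamps.shift(1)[is_gap],  # last record before the gap
    "gap_end": stamps[is_gap],             # first record after the gap
    "duration": delta[is_gap],
}).reset_index(drop=True)

print(gaps.sort_values("duration", ascending=False))
```

Sorting by `duration` puts the longest outages first, which can help decide which periods to exclude before modeling.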
There is also Chiller Plant Equipment data, which gives more detail on the specific pieces of equipment:
ChillerEquipData = pd.read_csv('ChillerPlantEquip.csv', index_col='timestamp', parse_dates=True)
ChillerEquipData
The Air Handling Unit data can serve as an indicator of system operation and load profiling. There is a lot of good data from the airside to indicate what percentage of the building is occupied at any given point in time. This is important because occupancy has a significant effect on consumption.
Airsidedata = pd.read_csv('Airside.csv', index_col='timestamp', parse_dates=True)
Airsidedata
You'll notice there are quite a few more pieces of equipment, and therefore points, in this dataset. When we plot it, it looks like a mess:
Airsidedata.resample('D').mean().plot(figsize=(25,30),legend=False)
Due to this amount of data, we'll use a 'mapping set' to help us navigate the data:
DataPointMapping = pd.read_csv('UWC Cooling System Point List MetaData - UWCPointList.csv')
DataPointMapping.head()
Let's use the mapping set to query our Airside dataframe:
# Select the Airside columns whose point names are tagged 'CO2Sensor' in the mapping set
CO2SensorData = Airsidedata[DataPointMapping.PointName[DataPointMapping.PointTag == 'CO2Sensor']]
CO2SensorData
CO2SensorData.resample('D').mean().plot(figsize=(25,15),ylim=(0,1000))
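The mapping lookup above can be wrapped in a small helper so the same query works for any tag. The sketch below uses toy data; it reuses the `PointName` and `PointTag` column names from the mapping set, but the point names, the `SupplyTemp` tag, and the helper `select_by_tag` are hypothetical. It also skips names that are listed in the mapping but have no matching column, which would otherwise raise a `KeyError`.

```python
import pandas as pd

# Toy mapping table; the point names and the 'SupplyTemp' tag are made up.
mapping = pd.DataFrame({
    "PointName": ["AHU1_CO2", "AHU2_CO2", "AHU1_SupplyTemp", "AHU3_CO2"],
    "PointTag": ["CO2Sensor", "CO2Sensor", "SupplyTemp", "CO2Sensor"],
})

# Toy measurement frame; AHU3_CO2 is in the mapping but has no column here.
data = pd.DataFrame(
    {"AHU1_CO2": [410.0, 520.0], "AHU2_CO2": [430.0, 610.0], "AHU1_SupplyTemp": [13.5, 14.0]},
    index=pd.to_datetime(["2013-01-01", "2013-01-02"]),
)

def select_by_tag(frame, point_map, tag):
    """Return the columns of `frame` whose PointName carries `tag` in `point_map`."""
    names = set(point_map.loc[point_map["PointTag"] == tag, "PointName"])
    # Keep only names that exist as columns, preserving the frame's column order.
    present = [col for col in frame.columns if col in names]
    return frame[present]

co2 = select_by_tag(data, mapping, "CO2Sensor")
print(co2.columns.tolist())
```

With the real data, the call would look like `select_by_tag(Airsidedata, DataPointMapping, 'CO2Sensor')`.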
The VAV data is essentially the room conditions in each of the air-conditioned zones: the reading from the thermostat you see on the wall in each room and the amount of air coming in through the diffuser in the ceiling. I think this information is quite valuable for understanding performance from a thermal comfort perspective.
VAVZoneTemp = pd.read_csv('VAVZoneTemp.csv',index_col='timestamp',parse_dates=True)
VAVZoneTemp
VAVZoneTemp.resample('D').mean().plot(figsize=(30,25),legend=False, ylim=(15,35))
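One possible way to turn zone temperatures into a thermal comfort indicator is the share of reporting zones that sit inside a comfort band at each timestamp. This is only a sketch: the 23-26 degC band is an assumption chosen for illustration, not a value from the project, and the zone names are made up.

```python
import pandas as pd

# Toy zone temperatures in degC; zone names are hypothetical.
temps = pd.DataFrame(
    {
        "Zone_A": [24.0, 27.5, 25.0],
        "Zone_B": [22.0, 24.5, 25.5],
        "Zone_C": [24.5, float("nan"), 26.5],  # NaN marks a missing reading
    },
    index=pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-03"]),
)

LOW, HIGH = 23.0, 26.0  # assumed comfort band in degC

in_band = temps.ge(LOW) & temps.le(HIGH)  # NaN readings compare as False
reporting = temps.notna()                 # zones that have a reading

# Share of reporting zones inside the band at each timestamp.
share_comfortable = in_band.sum(axis=1) / reporting.sum(axis=1)
print(share_comfortable)
```

Dividing by the number of reporting zones rather than the total keeps sensor dropouts from being counted as discomfort.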